SIMILAR-PATTERN SEARCHING APPARATUS, SIMILAR-PATTERN
SEARCHING METHOD, SIMILAR-PATTERN SEARCHING PROGRAM, AND
FRACTION SEPARATING APPARATUS



TECHNICAL FIELD 

[0001] The present invention relates to a similar-pattern searching
apparatus, a similar-pattern searching method, a similar-pattern search
program, and a fraction separating apparatus for searching for a pattern
having a high similarity to a pattern of a test sample from a group
including a plurality of patterns.

BACKGROUND ART

[0002] Flow cytometry, for example, is a test method capable of
classifying leukocytes into neutrophils, lymphocytes, monocytes,
acidophils, and the like within a short period of time. Leukocyte
particle size data obtained by flow cytometry can be classified into
various particle size patterns according to maturation or disease (see
Nonpatent Literature 1).

[0003] Many facilities have introduced this test as a daily screening
test. In most cases, however, only the clustered numerical data is used,
and the leukocyte particle size data generated in an analyzer is rarely
used for clinical diagnosis. There are various reasons for this. For
example, the leukocyte particle size data is so large that it cannot be
handled by an external information system. In addition, raw analysis
data is only inspected visually, and it is difficult to investigate the
data by a scientific method.

[0004] In view of these circumstances, the inventor of the present
patent application developed a clustering method based on a
self-organizing map (SOM) using leukocyte particle size data obtained as
a two-dimensional histogram (see Nonpatent Literatures 2 to 4). This
clustering method includes recording leukocyte particle size data in a
database, and extracting characteristic patterns using a data mining
technique, which enables classification that cannot be made based only
on the two-dimensional histogram information.
[0005] The conventional classification method is executed in the
analyzer by using a fraction separating method in which the troughs
between respective fractions are set as boundaries. Each of the
resultant fractions is used as one piece of numerical data for a
diagnosis. However, this method cannot separate the distributions of a
plurality of proximate clusters, e.g., stab cells and segmented cells
belonging to neutrophils, or normal cells and juvenile cells.

[0006] Nonpatent Literature 1: Noriyuki TATSUMI, Izumi TSUDA, Takayuki
TAKUBO, et al.: "Practical Use of Automated White Cell Differential",
HORIBA Technical Reports, No. 20, pp. 23-26, 2000.

Nonpatent Literature 2: Hiromi KATAOKA, Hiromi IOKI, Osamu KONISHI, et
al.: "Construction of Data Mining Assistance System for Leukocyte
Particle Size Distribution", Japanese Journal of Clinical Laboratory
Automation (JJCLA), Vol. 27, 4, pp. 583, 2002.

Nonpatent Literature 3: Hiromi KATAOKA, Hiromi IOKI, Osamu KONISHI, et
al.: "Clustering and 3D visualization of Leukocyte Scattergrams",
Medical Informatics 22 (Suppl.), pp. 209-210, 2002.

Nonpatent Literature 4: Hiromi IOKI, Hiromi KATAOKA, Yuka KAWASAKI, et
al.: "Clustering of Leukocyte Scattergram in Allergic Diseases", Medical
Informatics 22 (Suppl.), pp. 211-212, 2002.
DISCLOSURE OF INVENTION 

PROBLEM TO BE SOLVED BY THE INVENTION 
[0007] The present invention has been made to solve the problems in the
conventional technology. It is an object of the present invention to
provide a similar-pattern searching apparatus, a similar-pattern
searching method, a similar-pattern search program, and a fraction
separating apparatus capable of performing a highly accurate similarity
search for a pattern having a high similarity to a pattern of a test
sample from a group including a plurality of patterns, and of providing
information useful for a diagnosis.

MEANS FOR SOLVING PROBLEM

[0008] To solve the above problems and to achieve the objects, according
to the invention of claim 1, a similar-pattern searching apparatus for
searching for a pattern having a high similarity to a pattern of a test
sample from a group including a plurality of patterns includes a storage
unit that stores a class map generated by selecting model parameters
that characterize a plurality of component fractions included in each of
the patterns, and by clustering the patterns; and a similar-pattern
searching unit that selects a class similar to a component fraction
included in the pattern of the test sample from the class map.

[0009] According to the invention of claim 1, a plurality of patterns
are clustered to generate the class map using the model parameters that
characterize a plurality of component fractions included in each of the
patterns. The class similar to the component fraction included in the
pattern of the test sample is selected from the class map, and subjected
to a highly accurate similarity search.

[0010] According to the invention of claim 2, the patterns are
one-dimensional or multi-dimensional patterns. According to the
invention of claim 2, the one-dimensional or multi-dimensional patterns
are subjected to the highly accurate similarity search.

[0011] According to the invention of claim 3, the patterns are leukocyte
particle size patterns, protein electrophoretic waveforms, or blood cell
histograms. According to the invention of claim 3, the leukocyte
particle size patterns, the protein electrophoretic waveforms, or the
blood cell histograms are subjected to the highly accurate similarity
search.

[0012] According to the invention of claim 4, a similar-pattern
searching method of searching for a pattern having a high similarity to
a pattern of a test sample from a group including a plurality of
patterns includes a class-map generating step of selecting model
parameters that characterize a plurality of component fractions included
in each of the patterns, clustering the patterns, and generating a class
map; a storage step of storing the class map generated at the class-map
generating step; and a similar-pattern searching step of selecting a
class similar to a component fraction included in the pattern of the
test sample from the class map.

[0013] According to the invention of claim 4, a plurality of patterns
are clustered to generate the class map using the model parameters that
characterize a plurality of component fractions included in each of the
patterns. The class similar to the component fraction included in the
pattern of the test sample is selected from the class map, and subjected
to a highly accurate similarity search.

[0014] According to the invention of claim 5, a similar-pattern search
program that realizes on a computer a similar-pattern searching method
of searching for a pattern having a high similarity to a pattern of a
test sample from a group including a plurality of patterns, causes the
computer to execute a class-map generating process of selecting model
parameters that characterize a plurality of component fractions included
in each of the patterns, clustering the patterns, and generating a class
map; a storage process of storing the class map generated at the
class-map generating process; and a similar-pattern searching process of
selecting a class similar to a component fraction included in the
pattern of the test sample from the class map.

[0015] According to the invention of claim 5, a plurality of patterns
are clustered to generate the class map using the model parameters that
characterize a plurality of component fractions included in each of the
patterns. The class similar to the component fraction included in the
pattern of the test sample is selected from the class map, and subjected
to a highly accurate similarity search.

[0016] According to the invention of claim 6, a similar-pattern
searching apparatus for searching for a leukocyte particle size pattern
having a high similarity to a leukocyte particle size pattern of a test
sample from a group including a plurality of leukocyte particle size
patterns, each of the leukocyte particle size patterns including a
plurality of cellular component fractions, includes a primary clustering
unit that clusters the leukocyte particle size patterns obtained by a
measurement by applying a self-organizing map to the leukocyte particle
size patterns, and that generates a primary class map; a first-parameter
determining unit that executes an EM algorithm for the respective
leukocyte particle size patterns included in the primary class map using
predetermined initial values, thereby determining first mixture
distribution model parameters including the number of cellular
components contained in each of the patterns, and an average, a
variance, and a density of each of the cellular components; a
second-parameter determining unit that executes the EM algorithm for the
respective leukocyte particle size patterns using the first mixture
distribution model parameters as the initial values, thereby determining
second mixture distribution model parameters including the number of the
cellular components contained in each of the leukocyte particle size
patterns, and the average, the variance, and the density of each
cellular component; a secondary clustering unit that clusters the
respective leukocyte particle size patterns by applying the
self-organizing map to the first mixture distribution model parameters,
and that generates a secondary class map; an inter-class distance master
generator that calculates similarity distances between all combinations
of respective classes included in the secondary class map, and that
generates an inter-class distance master in which the combinations of
the classes correspond to the respective inter-class similarity
distances; a storage unit that stores the secondary class map and the
inter-class distance master; a class determining unit that determines,
from the secondary class map, the class to which each of the cellular
component fractions included in the leukocyte particle size pattern of
the test sample belongs; and a similar-pattern searching unit that
detects, as a similar class, a class whose similarity distance from the
class determined by the class determining unit is equal to or smaller
than a predetermined threshold, from the inter-class distance master,
and that determines a leukocyte particle size pattern included in the
similar class as the pattern having the high similarity to the leukocyte
particle size pattern of the test sample.
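
The inter-class distance master and the threshold-based detection of similar classes can be pictured with a small sketch. It is illustrative only: the Euclidean distance between class reference vectors, the dictionary layout, and the names `build_distance_master`, `similar_classes`, and `similar_patterns` are assumptions made for this example and are not specified in the text.

```python
from itertools import combinations

import numpy as np


def build_distance_master(class_vectors):
    """Map every unordered pair of classes to a similarity distance.

    class_vectors: dict {class_id: reference vector of that class}.
    Euclidean distance is an assumption made for this sketch.
    """
    master = {}
    for a, b in combinations(sorted(class_vectors), 2):
        diff = class_vectors[a] - class_vectors[b]
        master[(a, b)] = float(np.linalg.norm(diff))
    return master


def similar_classes(master, query_class, threshold):
    """Return (class_id, distance) for classes within the threshold."""
    hits = []
    for (a, b), dist in master.items():
        if dist <= threshold and query_class in (a, b):
            hits.append((b if a == query_class else a, dist))
    return sorted(hits, key=lambda hit: hit[1])


def similar_patterns(members, master, query_class, threshold):
    """Collect pattern IDs from the query class and its similar classes.

    members: dict {class_id: list of pattern IDs} (hypothetical layout).
    """
    ids = list(members.get(query_class, []))
    for cls, _ in similar_classes(master, query_class, threshold):
        ids.extend(members.get(cls, []))
    return ids
```

Because every pairwise distance is computed once and stored, a search reduces to a lookup against the master, and changing the threshold widens or narrows the set of similar classes without reclustering.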

[0017] According to the invention of claim 6, the respective components
included in each of the leukocyte particle size patterns are separated
by the EM algorithm using the initial values determined by using the
self-organizing map. In addition, the leukocyte particle size patterns
are clustered again using the self-organizing map. The secondary class
map and the inter-class distance master are thereby constructed.

[0018] According to the invention of claim 7, a similar-pattern
searching method of searching for a leukocyte particle size pattern
having a high similarity to a leukocyte particle size pattern of a test
sample from a group including a plurality of leukocyte particle size
patterns, each of the leukocyte particle size patterns including a
plurality of cellular component fractions, includes a primary clustering
step of clustering the leukocyte particle size patterns obtained by a
measurement by applying a self-organizing map to the leukocyte particle
size patterns, and generating a primary class map; a first-parameter
determining step of executing an EM algorithm for the respective
leukocyte particle size patterns included in the primary class map using
predetermined initial values, thereby determining first mixture
distribution model parameters including the number of cellular
components contained in each of the patterns, and an average, a
variance, and a density of each of the cellular components; a
second-parameter determining step of executing the EM algorithm for the
respective leukocyte particle size patterns using the first mixture
distribution model parameters as the initial values, thereby determining
second mixture distribution model parameters including the number of the
cellular components contained in each of the leukocyte particle size
patterns, and the average, the variance, and the density of each
cellular component; a secondary clustering step of clustering the
respective leukocyte particle size patterns by applying the
self-organizing map to the first mixture distribution model parameters,
and generating a secondary class map; an inter-class distance master
generating step of calculating similarity distances between all
combinations of respective classes included in the secondary class map,
and generating an inter-class distance master in which the combinations
of the classes correspond to the respective inter-class similarity
distances; a storing step of storing the secondary class map and the
inter-class distance master; a class determining step of determining,
from the secondary class map, the class to which each of the cellular
component fractions included in the leukocyte particle size pattern of
the test sample belongs; and a similar-pattern searching step of
detecting, as a similar class, a class whose similarity distance from
the class determined at the class determining step is equal to or
smaller than a predetermined threshold, from the inter-class distance
master, and determining a leukocyte particle size pattern included in
the similar class as the pattern having the high similarity to the
leukocyte particle size pattern of the test sample.

[0019] According to the invention of claim 7, the respective components
included in each of the leukocyte particle size patterns are separated
by the EM algorithm using the initial values determined by using the
self-organizing map. In addition, the leukocyte particle size patterns
are clustered again using the self-organizing map. The secondary class
map and the inter-class distance master are thereby constructed.
[0020] According to the invention of claim 8, a similar-pattern search
program that realizes on a computer a similar-pattern searching method
of searching for a leukocyte particle size pattern having a high
similarity to a leukocyte particle size pattern of a test sample from a
group including a plurality of leukocyte particle size patterns, each of
the leukocyte particle size patterns including a plurality of cellular
component fractions, causes the computer to execute a primary clustering
process of clustering the leukocyte particle size patterns obtained by a
measurement by applying a self-organizing map to the leukocyte particle
size patterns, and generating a primary class map; a first-parameter
determining process of executing an EM algorithm for the respective
leukocyte particle size patterns included in the primary class map using
predetermined initial values, thereby determining first mixture
distribution model parameters including the number of cellular
components contained in each of the patterns, and an average, a
variance, and a density of each of the cellular components; a
second-parameter determining process of executing the EM algorithm for
the respective leukocyte particle size patterns using the first mixture
distribution model parameters as the initial values, thereby determining
second mixture distribution model parameters including the number of the
cellular components contained in each of the leukocyte particle size
patterns, and the average, the variance, and the density of each
cellular component; a secondary clustering process of clustering the
respective leukocyte particle size patterns by applying the
self-organizing map to the first mixture distribution model parameters,
and generating a secondary class map; an inter-class distance master
generating process of calculating similarity distances between all
combinations of respective classes included in the secondary class map,
and generating an inter-class distance master in which the combinations
of the classes correspond to the respective inter-class similarity
distances; a storing process of storing the secondary class map and the
inter-class distance master; a class determining process of determining,
from the secondary class map, the class to which each of the cellular
component fractions included in the leukocyte particle size pattern of
the test sample belongs; and a similar-pattern searching process of
detecting, as a similar class, a class whose similarity distance from
the class determined at the class determining process is equal to or
smaller than a predetermined threshold, from the inter-class distance
master, and determining a leukocyte particle size pattern included in
the similar class as the pattern having the high similarity to the
leukocyte particle size pattern of the test sample.
[0021] According to the invention of claim 8, the respective components
included in each of the leukocyte particle size patterns are separated
by the EM algorithm using the initial values determined by using the
self-organizing map. In addition, the leukocyte particle size patterns
are clustered again using the self-organizing map. The secondary class
map and the inter-class distance master are thereby constructed.

[0022] According to the invention of claim 9, a
cellular-component-fraction separating apparatus for separating a
plurality of cellular component fractions included in a leukocyte
particle size pattern includes a primary clustering unit that clusters a
plurality of leukocyte particle size patterns, which are obtained by
measurement, by applying a self-organizing map to the leukocyte particle
size patterns, and that generates a primary class map; a parameter
determining unit that executes an EM algorithm for the respective
leukocyte particle size patterns included in the primary class map using
predetermined initial values, thereby determining mixture distribution
model parameters including the number of cellular components contained
in each of the patterns, and an average, a variance, and a density of
each of the cellular component fractions; and a fraction separating unit
that executes the EM algorithm for the respective leukocyte particle
size patterns using the mixture distribution model parameters as the
initial values, thereby separating the cellular component fractions
included in each of the leukocyte particle size patterns.
[0023] According to the invention of claim 9, a self-organizing map
(SOM) is applied to determine the initial values of the EM algorithm.

EFFECT OF THE INVENTION 

[0024] The similar-pattern searching apparatus according to the present
invention (claim 1) clusters a plurality of patterns to generate the
class map using the model parameters that characterize a plurality of
component fractions included in each of the patterns. In addition, the
apparatus selects the class similar to the component fraction included
in the pattern of the test sample from the class map. It is, therefore,
advantageously possible to search with high accuracy for a pattern
having a high similarity to the pattern of the test sample from the
group including a plurality of patterns. In addition, it is
advantageously possible to provide information useful for a diagnosis.

[0025] The similar-pattern searching apparatus according to the present
invention (claim 2) uses the one-dimensional or multi-dimensional
patterns as the patterns. It is, therefore, advantageously possible to
search with high accuracy for a pattern having a high similarity to the
one-dimensional or multi-dimensional pattern of the test sample.
[0026] The similar-pattern searching apparatus according to the present
invention (claim 3) uses the leukocyte particle size patterns, the
protein electrophoretic waveforms, or the blood cell histograms as the
patterns. It is, therefore, advantageously possible to search with high
accuracy for a pattern having a high similarity to the leukocyte
particle size pattern, the protein electrophoretic waveform, or the
blood cell histogram of the test sample.
[0027] With the similar-pattern searching method according to the
present invention (claim 4), a plurality of patterns are clustered to
generate the class map using the model parameters that characterize a
plurality of component fractions included in each of the patterns. In
addition, the class similar to the component fraction included in the
pattern of the test sample is selected from the class map. It is,
therefore, advantageously possible to search with high accuracy for a
pattern having a high similarity to the pattern of the test sample from
the group including a plurality of patterns. In addition, it is
advantageously possible to provide information useful for a diagnosis.
[0028] The similar-pattern search program according to the present
invention (claim 5) clusters a plurality of patterns to generate the
class map using the model parameters that characterize a plurality of
component fractions included in each of the patterns. In addition, the
similar-pattern search program selects the class similar to the
component fraction included in the pattern of the test sample from the
class map. It is, therefore, advantageously possible to search with high
accuracy for a pattern having a high similarity to the pattern of the
test sample from the group including a plurality of patterns. In
addition, it is advantageously possible to provide information useful
for a diagnosis.

[0029] The similar-pattern searching apparatus according to the present
invention (claim 6) separates the respective components included in each
of the leukocyte particle size patterns by the EM algorithm using the
initial values determined by using the self-organizing map. In addition,
the apparatus clusters the leukocyte particle size patterns again using
the self-organizing map. The apparatus thereby constructs the secondary
class map and the inter-class distance master. It is, therefore,
advantageously possible to arbitrarily select the similarities of the
search target.
[0030] Conventionally, the particle size data of two-dimensional
histograms is directly used to perform clustering using a
self-organizing map (SOM). Due to this, a similarity search with
attention paid to partial similarities of interest cannot be performed
for the respective components of the leukocyte. According to the present
invention, by performing a mixture density approximation using an EM
algorithm to separate the respective components, and further by
clustering characteristic parameters of the respective fractions, it is
possible to perform the similarity search with attention paid to the
distribution pattern of the cell group of interest.

[0031] With the similar-pattern searching method according to the
present invention (claim 7), the respective components included in each
of the leukocyte particle size patterns are separated by the EM
algorithm using the initial values determined by using the
self-organizing map. In addition, the leukocyte particle size patterns
are clustered again using the self-organizing map. The secondary class
map and the inter-class distance master are thereby constructed. It is,
therefore, advantageously possible to arbitrarily select the
similarities of the search target.

[0032] The similar-pattern search program according to the present
invention (claim 8) separates the respective components included in each
of the leukocyte particle size patterns by the EM algorithm using the
initial values determined by using the self-organizing map. In addition,
the similar-pattern search program clusters the leukocyte particle size
patterns again using the self-organizing map. The similar-pattern search
program thereby constructs the secondary class map and the inter-class
distance master. It is, therefore, advantageously possible to
arbitrarily select the similarities of the search target.

[0033] The fraction separating apparatus according to the present
invention (claim 9) applies the self-organizing map (SOM) to the
determination of the initial values of the EM algorithm. It is,
therefore, advantageously possible to solve the problem of convergence
of the marginal likelihood on a local maximum.

BRIEF DESCRIPTION OF DRAWINGS 

[0034] Fig. 1 is a block diagram of a similar-pattern searching
apparatus 1 according to an embodiment of the present invention;

Fig. 2 is a flowchart of a process performed by the similar-pattern
searching apparatus 1;

Fig. 3 is an example of a primary class map obtained as a result of
primary clustering using a self-organizing map (SOM);

Fig. 4 depicts a two-dimensional histogram of original particle size
data (upper view), and a redrawn and modeled two-dimensional histogram
obtained by combining respective fraction components using obtained
mixture distribution parameters (lower view);

Fig. 5 is an example of a secondary class map obtained as a result of
clustering, using the SOM, respective mixture distribution model
parameters obtained by an EM algorithm;

Fig. 6 is a distribution chart of stab cells and segmented cells
distributed in a neutrophil region;

Fig. 7 is an enlarged view of a segmented cell distribution based on
Class 351;

Fig. 8 is a chart plotting distances of respective classes from Class
801 of acidophils;

Fig. 9 is an example of a primary class map obtained as a result of
primary clustering of protein electrophoretic waveforms using the SOM;

Fig. 10 is an example of a primary class map obtained as a result of
primary clustering of blood cell histograms using the SOM; and

Fig. 11 depicts one embodiment of the present invention.

EXPLANATIONS OF LETTERS OR NUMERALS
[0035]
1 Similar pattern search apparatus
11 Primary clustering unit
12 First parameter determining unit
13 Second parameter determining unit
14 Secondary clustering unit
15 Inter-class distance master generator
16 Memory
17 Class determining unit
18 Similar pattern search unit
2 Analyzer
3 External input and output apparatus

BEST MODE(S) FOR CARRYING OUT THE INVENTION 

[0036] Exemplary embodiments of a similar-pattern searching apparatus, a
similar-pattern searching method, a similar-pattern search program, and
a fraction separating apparatus according to the present invention will
be explained hereinafter in detail with reference to the accompanying
drawings. The present invention is not limited to the embodiments.
Constituent elements in the embodiments below include elements that
persons skilled in the art can easily assume or that are substantially
the same. While a leukocyte particle size pattern is explained as an
example in the embodiments, the present invention is not limited
thereto.

[0037] After performing intensive research, the inventor of the present
patent application discovered that a similarity search can be done with
attention paid to a distribution pattern of a cell group of interest by
performing a mixture density approximation on cellular components
contained in a leukocyte particle size pattern using an EM algorithm to
separate the respective components, and further by clustering
characteristic parameters of the respective fractions. Based on this
knowledge, the inventor eventually made the present invention.

[0038] Generally, the EM algorithm has the following problems. The
convergence point depends largely on the initial conditions, and
convergence to a local maximum of the marginal likelihood is
unavoidable. Namely, depending on the initial values, the marginal
likelihood converges on a low-level local solution. The embodiments of
the present invention are intended to solve the problem of the
convergence of the marginal likelihood on a local maximum by calculating
initial values of respective classes based on a result of clustering the
entire leukocyte particle size data by a self-organizing map (SOM) in
advance. According to the embodiments, an algorithm capable of
performing a high-speed similarity search from general viewpoints, such
as a search for the respective cellular components of the leukocyte or
for a combination of the respective components, is developed. In
addition, information useful for a diagnosis is provided.
[0039] An embodiment of the present invention will be explained
hereinafter. Fig. 1 is a block diagram of a similar-pattern searching
apparatus 1 according to the embodiment of the present invention. The
similar-pattern searching apparatus 1 includes a primary clustering unit
11, a first parameter determining unit 12, a second parameter
determining unit 13, a secondary clustering unit 14, an inter-class
distance master generator 15, a storage unit 16, a class determining
unit 17, and a similar-pattern searching unit 18.

[0040] According to the present embodiment, respective components can be
separated by performing a mixture density approximation using the EM
algorithm, and a similarity search with attention paid to a distribution
pattern of a cell group of interest can be performed by clustering
characteristic parameters of each of the resultant fractions.
[0041] The EM algorithm is constituted by two processing steps, i.e., an
Expectation step (E-step) and a Maximization step (M-step). By repeating
these two steps while updating the parameters until convergence is
obtained, a maximal point of the maximum likelihood estimate can be
acquired. The E-step includes calculating the conditional expectation
value of the logarithmic likelihood, while the M-step includes
maximizing the conditional expectation value.
A dataset and an approximation model used in this embodiment are as
follows.

Data type: Two-dimensional histograms

Model: Normal mixture model

Parameters: Average, variance, and density
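
The E-step and M-step for a normal mixture model can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the two-dimensional histogram is supplied as bin coordinates `x` with bin counts `w`, uses NumPy, and the names `gaussian_pdf` and `em_normal_mixture` and the small regularization constants are assumptions introduced here.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return np.exp(expo) / norm


def em_normal_mixture(x, w, means, covs, dens, n_iter=200, tol=1e-8):
    """Fit a normal mixture to points x (n, d) weighted by counts w (n,).

    means, covs, dens: initial average, variance (covariance matrix),
    and density (mixing proportion) of each component.
    Returns the fitted parameters and the log-likelihood from the
    last E-step.
    """
    means = np.array(means, dtype=float)
    covs = np.array(covs, dtype=float)
    dens = np.array(dens, dtype=float)
    k = len(dens)
    prev = -np.inf
    loglik = prev
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        p = np.stack(
            [dens[j] * gaussian_pdf(x, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        total = p.sum(axis=1) + 1e-300
        resp = p / total[:, None]
        loglik = float(np.sum(w * np.log(total)))
        # M-step: re-estimate parameters from weighted responsibilities.
        rw = resp * w[:, None]
        nk = rw.sum(axis=0)
        dens = nk / nk.sum()
        for j in range(k):
            means[j] = rw[:, j] @ x / nk[j]
            diff = x - means[j]
            covs[j] = (rw[:, j, None] * diff).T @ diff / nk[j]
            covs[j] += 1e-6 * np.eye(x.shape[1])  # keep covariance invertible
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return means, covs, dens, loglik
```

The result depends on the initial `means`, `covs`, and `dens` passed in, which is why the choice of initial values matters for this algorithm.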

[0042] Generally, the EM algorithm has the following problems. The
convergence point depends largely on the initial conditions, and
convergence to a local maximum of the marginal likelihood is
unavoidable. Namely, depending on the initial values, the marginal
likelihood converges on a low-level local solution. The embodiments of
the present invention are intended to solve the problem of the
convergence of the marginal likelihood on a local maximum by calculating
initial values of respective classes based on a result of clustering the
entire leukocyte particle size data by the SOM in advance.

[0043] An analyzer 2 measures data on the two-dimensional histograms of
leukocyte particle sizes. The similar-pattern searching apparatus 1
obtains the data from the analyzer 2 and stores it in the storage unit
16.

[0044] The primary clustering unit 11 clusters a plurality
of leukocyte particle size patterns obtained by measurement
by applying the SOM to the patterns, and generates a
primary class map.
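
As a rough illustration of the SOM clustering performed by the primary clustering unit 11, the following sketch trains a small map on flattened patterns. It is a simplified variant and not the embodiment's implementation: the deterministic initialization, the square (Chebyshev) neighborhood, the constant learning rate, and the names `train_som` and `best_matching_unit` are assumptions.

```python
def best_matching_unit(weights, sample):
    """Return (row, col) of the unit whose weight vector is closest to sample."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(sample, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(samples, rows, cols, radius, rate, epochs):
    """Train a rows x cols map; weights[r][c] is the reference vector of a unit."""
    dim = len(samples[0])
    n_units = rows * cols
    # Assumed initialization: spread the units evenly over the range [0, 1].
    weights = [[[(r * cols + c) / max(n_units - 1, 1)] * dim
                for c in range(cols)] for r in range(rows)]
    for _ in range(epochs):
        for s in samples:
            br, bc = best_matching_unit(weights, s)
            # Pull the winner and its neighbors toward the sample.
            for r in range(rows):
                for c in range(cols):
                    if max(abs(r - br), abs(c - bc)) <= radius:
                        w = weights[r][c]
                        for i in range(dim):
                            w[i] += rate * (s[i] - w[i])
    return weights
```

With these hypothetical names, a call such as `train_som(patterns, 12, 12, radius=4, rate=0.3, epochs=n)` would mirror a 12x12 map with a neighboring distance of four and a learning rate of 0.3, although pure Python would be slow for 16,384-element inputs.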

[0045] The first parameter determining unit 12 executes
the EM algorithm using predetermined initial values for the
respective patterns included in the primary class map. The
first parameter determining unit 12 thereby determines
first mixture distribution model parameters constituted by
the number of cellular components contained in each
leukocyte particle size pattern, and an average, a variance,
and a density of each cellular component.

[0046] The second parameter determining unit 13 executes
the EM algorithm for the respective measured leukocyte
particle size patterns with the first mixture distribution
model parameters used as the initial values. The second
parameter determining unit 13 thereby determines second
mixture distribution model parameters constituted by the
number of cellular components contained in each leukocyte
particle size pattern, and an average, a variance, and a
density of each cellular component.

[0047] The secondary clustering unit 14 clusters the
second mixture distribution model parameters by applying
the SOM to them, and generates a secondary class map. In
the present embodiment, the SOM is used for the
clustering. Alternatively, K-means clustering or another
clustering method can be used.

[0048] The inter-class distance master generator 15
calculates similarity distances between all combinations of
the classes included in the secondary class map. In
addition, the inter-class distance master generator 15
generates an inter-class distance master in which the
combinations of classes are associated with the respective
inter-class distances.
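
The inter-class distance master can be pictured as a table keyed by pairs of classes. The sketch below is only an illustration: the text does not specify the distance measure, so the Euclidean distance between class reference vectors is an assumption, as are the names `build_distance_master` and `lookup_distance`.

```python
import math
from itertools import combinations

def build_distance_master(class_vectors):
    """Map each unordered pair of class ids to a distance.

    class_vectors: {class_id: reference vector of that class}.
    The Euclidean distance is an assumed similarity distance.
    """
    master = {}
    for a, b in combinations(sorted(class_vectors), 2):
        master[(a, b)] = math.dist(class_vectors[a], class_vectors[b])
    return master

def lookup_distance(master, a, b):
    """Read the distance for a pair of classes regardless of their order."""
    if a == b:
        return 0.0
    return master[(a, b) if a < b else (b, a)]
```

Storing each unordered pair once halves the size of the table; for a 30x30 secondary class map this gives 900 x 899 / 2 = 404,550 entries.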

[0049] The storage unit 16 stores therein such data as
the two-dimensional histogram data on the leukocyte
particle size patterns measured by the analyzer 2, the
secondary class map data generated by the secondary
clustering unit 14, and the inter-class distance master
data generated by the inter-class distance master generator
15.

[0050] The class determining unit 17 determines, from the
secondary class map, the class to which each cellular
component fraction contained in a leukocyte particle size
pattern of a test sample belongs.

[0051] The similar-pattern searching unit 18 detects, from
the inter-class distance master, as a similar class, a class
whose similarity distance from the class determined by the
class determining unit 17 is equal to or smaller than a
threshold. In addition, the similar-pattern searching unit
18 determines a leukocyte particle size pattern included in
the similar class as a pattern having a high similarity to
the leukocyte particle size pattern of the test sample. In
the present embodiment, the inter-class distance is used in
determining the similarity. However, the similarity
evaluation criterion (cluster evaluation criterion) is not
limited to the inter-class distance. Alternatively, a
distance from the center of gravity of a cluster, an
intra-cluster distance, or the like can be used.
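
The threshold-based detection of similar classes can be sketched as follows. The table layout (a dictionary keyed by ordered class pairs), the function names, and the choice to count the query class itself as a hit at distance zero are assumptions made for illustration.

```python
def similar_classes(master, query_class, threshold):
    """Return the classes within threshold of query_class, nearest first.

    master: {(class_a, class_b): distance}, one entry per pair.
    """
    hits = [(0.0, query_class)]  # assumption: the query class matches itself
    for (a, b), d in master.items():
        if d <= threshold:
            if a == query_class:
                hits.append((d, b))
            elif b == query_class:
                hits.append((d, a))
    return [c for _, c in sorted(hits)]

def similar_patterns(master, class_members, query_class, threshold):
    """Collect the patterns that belong to any of the similar classes."""
    result = []
    for c in similar_classes(master, query_class, threshold):
        result.extend(class_members.get(c, []))
    return result
```

Raising the threshold widens the set of returned classes, while lowering it restricts the result to closer matches.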
[0052] An external input and output apparatus 3
transmits user-input parameters, similar-pattern search
conditions, and the like to the similar-pattern
searching apparatus 1. In addition, the external input and
output apparatus 3 outputs the similar patterns hit in the
similar-pattern searching apparatus 1 on a screen.
[0053] Fig. 2 is a flowchart of a process performed by
the similar-pattern searching apparatus 1 according to the
present embodiment. As a specific example of the process,
an instance will be explained of processing data of 128x128
two-dimensional histograms for LMNE channels of 8,800
ordinary patient test samples analyzed by an automatic blood
cell counter PENTRA120 (manufactured by HORIBA, Ltd.) with
eight bits per test sample.

[0054] (1) Generation of the primary class map for
determination of initial values

The similar-pattern searching apparatus 1 subjects the
two-dimensional histogram data output from the analyzer 2
to eight-neighboring-point smoothing and clusters the
smoothed data using a SOM including an input layer of
128x128 (16,384 neurons) and a competitive layer of 12x12
(units), thereby obtaining 144 patterns. The
similar-pattern searching apparatus 1 determines the 144
patterns as the primary class map. As learning parameters
for the SOM, a neighboring distance is set to four and a
learning rate is set to 0.3. Furthermore, for each pattern
on this primary class map, 4x4, i.e., 16, divided regions
are set, and the center of gravity of the two-dimensional
histogram of each region is calculated. Using the centers
of gravity as initial values, the mixture model is
separated by the EM algorithm. The calculation is made on
the assumption that the distribution model of each fraction
is a normal distribution. The obtained mixture distribution
model parameters (the number of components, and an average,
a variance, and a density of each component) are manually
adjusted. Temporary parameters are thereby determined.
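
The calculation of the centers of gravity of the divided regions, used as initial values for the EM algorithm, might look like the sketch below. The function name, the `hist[y][x]` layout, and the decision to skip empty regions are assumptions and are not stated in the text.

```python
def region_centroids(hist, divisions=4):
    """Return the center of gravity (x, y) of each non-empty divided region.

    hist: square two-dimensional histogram given as hist[y][x] counts.
    divisions: number of regions per axis (4 gives 4x4 = 16 regions).
    """
    size = len(hist)
    step = size // divisions
    centroids = []
    for ry in range(divisions):
        for rx in range(divisions):
            total = sx = sy = 0.0
            for y in range(ry * step, (ry + 1) * step):
                for x in range(rx * step, (rx + 1) * step):
                    count = hist[y][x]
                    total += count
                    sx += count * x
                    sy += count * y
            # Assumption: a region without counts yields no initial value.
            if total > 0:
                centroids.append((sx / total, sy / total))
    return centroids
```

For a 128x128 pattern with `divisions=4`, each region covers 32x32 bins and at most 16 candidate initial averages are produced.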
[0055] (2) Mixture distribution approximation by the EM
algorithm

The mixture distribution approximation by the EM
algorithm can be carried out using a technique explained in
Sumio Watanabe: Data Learning Algorithm, Kyoritsu Shuppan
Co., Ltd., 2001; Igor V. Cadez, Scott Gaffney, Padhraic
Smyth: "A General Probabilistic Framework for Clustering
Individuals and Objects", Knowledge Discovery and Data
Mining, pp. 140-149, 2000, or the like.

[0056] Specifically, for the two-dimensional histogram
data on each test sample, the class having the highest
similarity is searched from the primary class map. The EM
algorithm is executed using the mixture distribution model
parameters for the searched class as initial values, thereby
separating the particle size components. The same process
is executed for the particle size data on all test samples,
and individual mixture distribution model parameters are
calculated.
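
Selecting the initial values for a test sample from its most similar primary class can be sketched as below. The squared Euclidean distance as the similarity measure, the flattened-list representation of the patterns, and both function names are assumptions.

```python
def nearest_class(sample, class_patterns):
    """Return the index of the primary-class pattern closest to the sample.

    sample and each class pattern are flattened histograms of equal length.
    """
    best_index, best_dist = -1, float("inf")
    for i, pattern in enumerate(class_patterns):
        d = sum((a - b) ** 2 for a, b in zip(sample, pattern))
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

def initial_parameters_for(sample, class_patterns, class_parameters):
    """Pick the temporary parameters of the most similar primary class."""
    return class_parameters[nearest_class(sample, class_patterns)]
```

The returned parameters would then serve as the starting point of the EM algorithm for that sample.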
[0057] (3) The secondary class map using the SOM, with the
mixture distribution parameters used as inputs, can be
generated by using a technique explained in Tom Heskes:
"Self-organizing maps, vector quantization, and mixture
modeling", IEEE Transactions on Neural Networks, 12, pp.
1299-1305, 2001, or the like.

[0058] The data is clustered by supplying, to the input
layer, real-valued mixture distribution model parameters
constituted by six parameters, i.e., an X average, a Y
average, an X variance, a Y variance, an XY covariance,
and a density, using a SOM having a competitive layer of
30x30 (units), a neighboring distance of 10, and a learning
rate of 0.3. This clustering result is used as a secondary
class map for a similarity search. At this time,
similarity distances between all combinations of the
classes are calculated and registered in the inter-class
distance master.

[0059] (4) Similarity search

The class to which each fraction of each test sample
belongs is obtained from the secondary class map, the
inter-class distance master is read, and a threshold is
determined according to the purpose of the search. Class
groups that satisfy the conditions are searched. By
making the threshold variable, the strictness of the
similarity in the search can be arbitrarily selected.
Furthermore, by searching the class group in the region
within the threshold under disjunction (OR) conditions, the
similarity search is realized. To search for an overall
pattern of the respective fractions, the search is done by
conjunction (AND) of the classes to which the respective
fractions belong.
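
The combination of disjunction within a fraction and conjunction across fractions can be sketched as follows. The data layout, in which every sample lists the classes of its fractions in a fixed order, and the function name `search_samples` are assumptions made for illustration.

```python
def search_samples(sample_classes, allowed_per_fraction):
    """Return ids of samples whose fractions all fall in the allowed classes.

    sample_classes: {sample_id: [class of fraction 0, class of fraction 1, ...]}
    allowed_per_fraction: one set of classes per fraction. Membership in a
    set is the disjunction (any class within the threshold region matches);
    requiring every fraction to match is the conjunction.
    """
    hits = []
    for sample_id, classes in sample_classes.items():
        if len(classes) != len(allowed_per_fraction):
            continue  # assumption: samples must have the same fractions
        if all(c in allowed
               for c, allowed in zip(classes, allowed_per_fraction)):
            hits.append(sample_id)
    return hits
```

Each set in `allowed_per_fraction` would typically be built from the inter-class distance master by collecting the classes within the chosen threshold of the class of interest.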

[0060] Fig. 3 is a view of a result of primary
clustering using the SOM. Fig. 3 displays the inside of
the 12x12 competitive layer. A result of clustering the
entire set of leukocyte particle size patterns into 144
clusters is obtained.

[0061] An upper view of Fig. 4 depicts a two-dimensional
histogram of original particle size data, where a symbol +
indicates an initial value and a symbol x indicates the path
along which the maximum likelihood search is executed by
the EM algorithm and the convergence point. A lower view of
Fig. 4 depicts a redrawn and modeled two-dimensional
histogram obtained by combining the respective fraction
components using the obtained mixture distribution
parameters.
[0062] Fig. 5 is a result of clustering the respective
mixture distribution model parameters obtained by the EM
algorithm using the SOM. An elliptic component drawn in
red indicates a cellular component fraction. Similar
patterns are arranged around the elliptic component. As
can be understood from Fig. 5, various patterns are present
for the respective cell groups. Distributions of
lymphocytes, monocytes, neutrophils, and acidophils are
obtained in a pink region 1, a yellow region 2, a sky blue
region 3, and a purple region 4, respectively. The cells
are clustered into four cell groups, exactly corresponding
to the LMNE channels. Furthermore, blood platelets are
mapped on a white region distributed below the lymphocytes,
and distributions considered to be abnormal cells are
mapped on boundary regions between the other white regions
and the respective cell groups. The classes shown in
Figs. 5 and 6 will be referred to by their sequential
numbers in raster direction, with the upper left corner
defined as Class 0 and the lower right corner defined as
Class 899.
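
The raster numbering of the classes can be expressed as a simple index conversion. The sketch assumes that "raster direction" means row by row from left to right on the 30x30 map; the function names are hypothetical.

```python
MAP_SIZE = 30  # the secondary class map has 30x30 units

def class_to_cell(class_id, cols=MAP_SIZE):
    """Convert a class number into (row, column), counted from the upper left."""
    return divmod(class_id, cols)

def cell_to_class(row, col, cols=MAP_SIZE):
    """Convert (row, column) back into the sequential class number."""
    return row * cols + col
```

Under this assumption, Class 0 is the cell (0, 0) and Class 899 is the cell (29, 29).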

[0063] Fig. 6 is a distribution chart of stab cells and
segmented cells distributed in the neutrophil region.
According to a result of visual classification under a
microscope, Class 120 is the class including more stab cells
than any other class and Class 351 is the class including
more segmented cells than any other class. A yellow
gradation region 31 (left) shows, by color intensity, the
distribution of similarity distance patterns centering
around Class 120, which includes the most stab cells,
conspicuously shifted leftward. In addition, a blue
gradation region 32 (right) shows patterns centering around
Class 351, which includes the most segmented cells.

[0064] Fig. 7 is an enlarged view of the segmented cell
distribution based on Class 351. To do a similarity
search in a wide range, the classes in a region surrounded
by a red line are searched. To search for highly similar
cells, the classes in a region surrounded by a green
line or a blue line are searched. By doing so, the search
targets can be narrowed down.

[0065] The visual boundary between the stab cells (Class
120) and the segmented cells (Class 351) shown in Fig. 6
is joined by smooth gradations, which indicates that the
similarity boundary is unclear. This suggests that the
stab cells and the segmented cells both belong to the
neutrophils and that similarities can therefore be clustered
on the map according to the degree of differentiation of the
cells. On the other hand, a clear boundary with fewer
gradations is observed between the segmented cells in
Class 351 and the lymphocyte region. This suggests that
these cell groups can be clearly separated on the map.
Fig. 8 is a chart plotting the distances of the respective
classes from Class 801 of the acidophils. In Fig. 8, the
vertical axis indicates the distance from Class 801 and the
horizontal axis indicates the classes sorted in ascending
order of distance. As can be seen from Fig. 8, the
acidophils are distributed in a range at a distance equal
to or smaller than one, and the similarity of the search
targets can be changed by making the distance threshold
variable. In addition, an interesting result is obtained
in that a stepped curve is obtained for the respective cell
groups and in that the stab cells and the segmented cells
of the neutrophils are divided by the monocytes. Such
plots tend to show various patterns depending on the cells
from which the distances are measured.

[0066] A similarity search system capable of arbitrarily
changing the similarity criterion, for the similarities of
the individual components of the leukocyte or for a combined
similarity of the components, is thus constructed. In the
EM algorithm, the initial values are determined from the
patterns clustered using the SOM in advance, whereby a
correct convergence result is obtained. The stab cells
cannot be separated from the segmented cells by flow
cytometry in the clinical testing field. However, they can
be easily separated by using the method according to the
embodiment, and a system that provides useful information
for diagnosis and treatment can be constructed.
[0067] One embodiment of the present invention has been
explained in detail with reference to the drawings.
However, specific examples of the configuration are not
limited to this embodiment. Design changes and the like
within the scope of the concept of the present invention
are also included in the present invention.
[0068] For example, in the embodiment, the similar-pattern
searching apparatus 1 searches the similarities of
the leukocyte particle size patterns. However, the
similar-pattern searching apparatus 1 can be configured to
search similarities of test sample patterns such as
one-dimensional protein electrophoretic waveforms and blood
cell histograms. The similar-pattern searching apparatus 1
thus can be configured to search the similarities of
various types of test sample patterns. In addition, the
test sample pattern is not limited to two-dimensional
information such as the leukocyte particle size pattern but
can be one-dimensional information or multi-dimensional
information (including a time axis). Fig. 9 is an example
of a primary class map obtained as a result of allowing the
similar-pattern searching apparatus 1 to cluster the
protein electrophoretic waveforms using the SOM. Fig. 10
is an example of a primary class map obtained as a result
of allowing the similar-pattern searching apparatus 1 to
cluster the blood cell histograms using the SOM.

[0069] Furthermore, in the embodiment, a computer
program for realizing the functions of the similar-pattern
searching apparatus 1 can be recorded in a
computer-readable recording medium 60 shown in Fig. 11. In
addition, the respective functions of the similar-pattern
searching apparatus 1 can be realized by allowing a
computer 50 shown in Fig. 11 to read and execute the
computer program in the recording medium 60.

[0070] The computer 50 shown in Fig. 11 includes a CPU
(central processing unit) 51 that executes the computer
program, an input device 52 such as a keyboard and a mouse,
a ROM (read only memory) 53, a RAM (random access memory)
54 that stores operation parameters and the like, a reader
55 that reads the computer program from the recording
medium 60, and an output device 56 such as a display and a
printer.

[0071] The CPU 51 reads the computer program recorded in
the recording medium 60 with the help of the reader 55 and
then executes the computer program, thereby realizing the
functions of the similar-pattern searching apparatus 1.
Examples of the recording medium 60 include an optical disk,
a flexible disk, a hard disk, and the like.



INDUSTRIAL APPLICABILITY

[0072] As explained so far, the similar-pattern
searching apparatus according to the present invention can
arbitrarily change the similarity criterion for the
similarities with which the respective components are
integrated. Therefore, information useful for diagnosis
and treatment can be provided.



